Deep learning is a type of machine learning that uses neural networks to essentially automate feature engineering by progressively extracting higher-level features from the data.
Simple, right?!
Each neuron is a linear combination of the inputs (and then a transformation) based on a given activation function.
If we have a binary outcome variable with a softmax (i.e., logistic) activation function, this neural network would be logistic regression.
What makes it a neural network is including multiple neurons (i.e., hidden units) and potentially multiple hidden layers (hidden just means they aren’t inputs or outputs).
This layered sequence of linear combinations and activation functions can approximate any function, essentially learning the feature engineering that results in better predictions.
Having more than one hidden layer is what makes a neural network deep.
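To make the neuron idea concrete, here is a minimal base R sketch of a single neuron; the input, weight, and bias values are made up purely for illustration.

```r
# A single neuron: a linear combination of the inputs plus a bias,
# passed through an activation function (logistic here).
x <- c(0.5, -1.2, 0.3)   # inputs
w <- c(0.8, 0.1, -0.4)   # weights (learned during training)
b <- 0.2                 # bias

z <- sum(w * x) + b      # linear combination
sigmoid <- function(z) 1 / (1 + exp(-z))
a <- sigmoid(z)          # the neuron's output, squashed into (0, 1)
```

A network just stacks many of these neurons, feeding each layer's outputs into the next layer's linear combinations.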
Let’s visualize how this works.
Install the nnet package!
Load libraries and set a random seed.
# Load packages, set seed.
library(tidyverse)
library(tidymodels)

set.seed(42)
# Import data, wrangle S1 into segment, coerce factors, and select predictors.
roomba_survey <- read_csv(here::here("Data", "roomba_survey.csv")) |>
  rename(segment = S1) |>
  mutate(
    segment = case_when(
      segment == 1 ~ "own",
      segment == 3 ~ "shopping",
      segment == 4 ~ "considering"
    ),
    segment = factor(segment),
    D2HomeType = factor(D2HomeType),
    D3Neighborhood = factor(D3Neighborhood),
    D4MaritalStatus = factor(D4MaritalStatus),
    D6Education = factor(D6Education)
  ) |>
  select(
    segment, contains("RelatedBehaviors"), contains("ShoppingAttitudes"),
    D1Gender, D2HomeType, D3Neighborhood, D4MaritalStatus, D6Education
  )
Normalizing all of the predictors is common for neural networks. Why?
\(X^{norm} = \frac{X - \mu_X}{\sigma_X}\)
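As a quick check of what this formula does, here it is applied by hand in base R on toy values (note that `sd()` uses the sample standard deviation, the same convention `step_normalize()` follows):

```r
# Normalize a predictor: subtract the mean, divide by the standard deviation.
x <- c(2, 4, 6, 8)
x_norm <- (x - mean(x)) / sd(x)

mean(x_norm)  # 0: the normalized predictor is centered
sd(x_norm)    # 1: and has unit scale
```

Putting every predictor on the same scale keeps any single predictor from dominating the weighted sums inside the neurons during training.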
# Split data based on segment.
roomba_split <- initial_split(roomba_survey, prop = 0.75, strata = segment)
# Feature engineering.
roomba_recipe <- training(roomba_split) |>
  recipe(segment ~ .) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors()) |>
  step_normalize(all_predictors())
There are many different types of neural networks. A common one is a multilayer perceptron, which has a single hidden layer.
# Set the model, engine, and mode.
nn_model_01 <- mlp() |>
set_engine(engine = "nnet") |>
set_mode("classification")
nn_model_01
## Single Layer Neural Network Model Specification (classification)
## 
## Computational engine: nnet
Create a workflow that combines the recipe and model and fit it.
# Create a workflow.
nn_wf_01 <- workflow() |>
  add_recipe(roomba_recipe) |>
  add_model(nn_model_01)

# Fit the workflow.
nn_fit_01 <- fit(nn_wf_01, data = training(roomba_split))
So far, we have repeated the same steps every time we want to evaluate a model.
What coding tool can we use to simplify this process a little bit? We can write a tidy function, i.e., a function that can accept an un-quoted variable name as an argument, by placing {{ }} around that variable name wherever it occurs in the function.
fit_accuracy <- function(fit, testing_data, truth) {
  # Arguments:
  # - fit: a fitted model
  # - testing_data: test data for which predictions will be generated
  # - truth: name of the target (Y) variable contained in testing_data
  fit |>
    predict(new_data = testing_data) |>
    bind_cols(testing_data) |>
    accuracy(truth = {{ truth }}, estimate = .pred_class)
}
Let’s look at the accuracy on the testing data.
# Compute model accuracy.
fit_accuracy(nn_fit_01, testing(roomba_split), segment)
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy multiclass     0.595
There are a lot of potential hyperparameters to tune, but we’ll focus on three.
- hidden_units: the number of hidden units in the single layer
- epochs: the number of training iterations
- penalty: a penalty term to help fight overfitting

The activation function is set based on the model mode. You can experiment with "relu" instead of the default of "linear" for regression and "softmax" for classification.
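Before letting tune() search for values, note that these three hyperparameters can also be fixed by hand inside mlp(). A minimal sketch, with illustrative values (not recommendations):

```r
library(parsnip)  # part of tidymodels, loaded above

# Set the three hyperparameters manually (illustrative values).
nn_model_manual <- mlp(hidden_units = 5, epochs = 500, penalty = 0.01) |>
  set_engine(engine = "nnet") |>
  set_mode("classification")
```

Hand-picked values rarely beat tuned ones, which is why we tune them via cross-validation below.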
“Teaching, exhorting, and explaining - as important as they are - can never convey to an investigator, a child, a student, or a member a witness of the truthfulness of the restored gospel. Only as their faith initiates action and opens the pathway to the heart can the Holy Ghost deliver confirming witnesses. Missionaries, parents, teachers, and leaders obviously must learn to teach by the power of the Spirit. Of equal importance, however, is the responsibility they have to help others learn for themselves by faith.”
Elder Bednar, “Learning in the Lord’s Way”, October 2018.
# Use v-fold cross-validation based on segment.
roomba_cv <- vfold_cv(training(roomba_split), v = 10, strata = segment)
# Set the model, engine, and mode.
nn_model_02 <- mlp(hidden_units = tune(), epochs = tune(), penalty = tune()) |>
set_engine(engine = "nnet") |>
set_mode("classification")
# Update the workflow.
nn_wf_02 <- nn_wf_01 |>
update_model(nn_model_02)
# Tune the hyperparameters by using the cross-validation.
nn_tune <- nn_wf_02 |>
tune_grid(resamples = roomba_cv)
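By default, tune_grid() chooses its own grid of candidate values. You can supply a custom grid instead; here is a sketch using grid_regular() from dials, with illustrative ranges:

```r
library(dials)  # part of tidymodels, loaded above

# A regular grid over the three tuning parameters (illustrative ranges).
nn_grid <- grid_regular(
  hidden_units(range = c(2L, 10L)),
  epochs(range = c(100L, 1000L)),
  penalty(range = c(-3, -1)),  # penalty's range is on the log10 scale
  levels = 3
)
nrow(nn_grid)  # 3 levels of 3 parameters = 27 candidate models
```

A grid like this would be passed in via tune_grid(resamples = roomba_cv, grid = nn_grid).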
nn_tune |>
  collect_metrics(summarize = FALSE) |>
  filter(.metric == "accuracy") |>
  group_by(.config) |>
  summarize(avg_accuracy = mean(.estimate)) |>
  arrange(desc(avg_accuracy))
## # A tibble: 10 × 2
##    .config               avg_accuracy
##    <chr>                        <dbl>
##  1 Preprocessor1_Model02        0.620
##  2 Preprocessor1_Model04        0.617
##  3 Preprocessor1_Model10        0.609
##  4 Preprocessor1_Model08        0.593
##  5 Preprocessor1_Model05        0.585
##  6 Preprocessor1_Model07        0.580
##  7 Preprocessor1_Model06        0.571
##  8 Preprocessor1_Model09        0.561
##  9 Preprocessor1_Model01        0.560
## 10 Preprocessor1_Model03        0.557
Select the best model and finalize our workflow.
# Select the best fitting model.
nn_tune_best <- nn_tune |>
  select_best(metric = "accuracy")

# Finalize the workflow.
nn_wf_final <- nn_wf_02 |>
  finalize_workflow(nn_tune_best)

nn_wf_final
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: mlp()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## • step_dummy()
## • step_zv()
## • step_normalize()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Single Layer Neural Network Model Specification (classification)
## 
## Main Arguments:
##   hidden_units = 2
##   penalty = 0.0110234294006162
##   epochs = 891
## 
## Computational engine: nnet
Now let’s fit on the entire training data.
# Fit the tuned workflow to the whole training data.
nn_fit_02 <- fit(nn_wf_final, data = training(roomba_split))

# Compute model accuracy.
fit_accuracy(nn_fit_02, testing(roomba_split), segment)
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy multiclass     0.631
Neural networks can be powerful prediction engines. However, they often require a lot of data and can be slow to train. As with every predictive model, there is no guarantee it will predict best for a given application.
While adding additional hidden units can help uncover finer structures in the data, adding additional hidden layers allows for increasingly nonlinear relationships between the predictors and the outcome.
However, additional hidden layers will require a neural network more sophisticated than a multilayer perceptron.
Summary
Next Time
Supplementary Material
Return to the set of models from the previous two exercises.